Database Partitioning Strategies for Social Network Data

Authors

  • Oscar Ricardo Moll Thomae
  • Samuel R. Madden
  • Dennis M. Freeman

Abstract

In this thesis, I designed, prototyped and benchmarked two different data partitioning strategies for social network type workloads. The first strategy takes advantage of the heavy-tailed degree distributions of social networks to optimize the latency of vertex neighborhood queries. The second strategy takes advantage of the high temporal locality of workloads to improve latencies for vertex neighborhood intersection queries. Both techniques aim to shorten the tail of the latency distribution, while avoiding decreased write performance or reduced system throughput when compared to the default hash partitioning approach. The strategies presented were evaluated using synthetic workloads of my own design as well as real workloads provided by Twitter, and show promising improvements in latency at some cost in system complexity.

Thesis Supervisor: Stu Hood
Title: Engineer at Twitter

Thesis Supervisor: Samuel R. Madden
Title: Associate Professor
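The abstract names hash partitioning as the baseline but does not describe its layout. The sketch below is only an illustration under stated assumptions: each vertex's adjacency list is stored whole on the partition chosen by hashing the vertex ID, and the class and function names (`HashPartitionedGraph`, `partition_of`) are hypothetical, not taken from the thesis. It shows the two query types the abstract mentions, a vertex neighborhood query and a neighborhood intersection query, and counts how many partitions each intersection touches.

```python
import hashlib
from collections import defaultdict

NUM_PARTITIONS = 4  # hypothetical cluster size, chosen for illustration


def partition_of(vertex_id: int, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a vertex ID to a partition using a stable hash (assumed scheme)."""
    digest = hashlib.md5(str(vertex_id).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


class HashPartitionedGraph:
    """Toy in-memory graph; each partition maps a vertex to its out-neighbors."""

    def __init__(self, num_partitions: int = NUM_PARTITIONS) -> None:
        self.num_partitions = num_partitions
        self.partitions = [defaultdict(set) for _ in range(num_partitions)]

    def add_edge(self, src: int, dst: int) -> None:
        # A write touches exactly one partition: the one that owns src.
        self.partitions[partition_of(src, self.num_partitions)][src].add(dst)

    def neighbors(self, vertex: int) -> set:
        # Neighborhood query: one partition lookup, but a very high-degree
        # vertex returns a very large list from that single partition.
        owner = partition_of(vertex, self.num_partitions)
        return set(self.partitions[owner].get(vertex, ()))

    def intersect_neighbors(self, u: int, v: int):
        # Intersection query: reads both adjacency lists, which may live on
        # two different partitions. Returns (common neighbors, partitions touched).
        touched = {
            partition_of(u, self.num_partitions),
            partition_of(v, self.num_partitions),
        }
        return self.neighbors(u) & self.neighbors(v), len(touched)


graph = HashPartitionedGraph()
for src, dst in [(1, 2), (1, 3), (4, 3), (4, 5)]:
    graph.add_edge(src, dst)

common, partitions_touched = graph.intersect_neighbors(1, 4)
print(common, partitions_touched)
```

Under this assumed layout, the cost of a neighborhood query grows with the vertex's degree and an intersection query may need data from two partitions, which are the latency sources the two strategies in the abstract target.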

Similar resources

Hermes: Dynamic Partitioning for Distributed Social Network Graph Databases

Social networks are large graphs that require multiple graph database servers to store and manage them. Each database server hosts a graph partition with the objectives of balancing server loads, reducing remote traversals (edge-cuts), and adapting the partitioning to changes in the structure of the graph in the face of changing workloads. To achieve these objectives, a dynamic repartitioning a...

Partitioning Graph Databases - A Quantitative Evaluation

The amount of globally stored, electronic data is growing at an increasing rate. This growth is both in size and connectivity, where connectivity refers to the increasing presence of, and interest in, relationships between data [12]. An example of such data is the social network graph created and stored by Twitter [2]. Due to this growth, demand is increasing for technologies that can process s...

Designing a trust-based recommender system in Social Rating Networks

One of the most common styles of business today is electronic business, since it is considered a principal means of financial transactions among advanced countries. Because human knowledge has evolved and expectations have risen with it, traditional marketing in electronic business cannot meet the current generation’s needs; in order to survive, organizat...

An Effective Method for Utility Preserving Social Network Graph Anonymization Based on Mathematical Modeling

In recent years, privacy concerns about publishing social network graph data have increased due to the widespread use of such data for research purposes. This paper addresses the problem of identity disclosure risk of a node assuming that the adversary identifies one of its immediate neighbors in the published data. The related anonymity level of a graph is formulated and a mathematical model is...

Schism: a Workload-Driven Approach to Database Replication and Partitioning

We present Schism, a novel workload-aware approach for database partitioning and replication designed to improve scalability of shared-nothing distributed databases. Because distributed transactions are expensive in OLTP settings (a fact we demonstrate through a series of experiments), our partitioner attempts to minimize the number of distributed transactions, while producing balanced partition...

Publication date: 2012